Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares
نویسندگان
چکیده
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Twoand three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-andconquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in tl-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and !i = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2tl) and has only one spare processor. We also present efficient layouts for fault-tolerant twoand three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
منابع مشابه
Fault-tolerant meshes with minimal numbers of spares
This paper presents several techniques for adding fault-tolerance t o distributed memory parallel computers. More formally, given a target graph with n nodes, we create a fault-tolerant graph with n + k nodes such that given any set of k or fewer faulty nodes, the remaining graph is guaranteed to contain the target graph as a fault-free subgraph. As a result, any algorithm designed for the targ...
متن کاملAdaptive Fault-Tolerant Wormhole Routing Algorithms for Hypercube and Mesh Interconnection
In this paper, we present adaptive fault-tolerant deadlock-free routing algorithms for hypercubes and meshes by using only 3 virtual channels and 2 virtual channels respectively. Based on the concept of unsafe nodes, we design a routing algorithm for hypercubes that can tolerate at least n 1 node faults and can route a message via a path of length no more than the Hamming distance between the s...
متن کاملFault tolerant system with imperfect coverage, reboot and server vacation
This study is concerned with the performance modeling of a fault tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line as well as warm standby units are subject to failures and are send for the repair to a repair facility having single repairman which is prone to failure. If the failed unit is not detected, the system enters into an unsafe...
متن کاملA Fault-Tolerant Deadlock-Free Multicast Algorithm for Wormhole Routed Hypercubes
In this paper, we propose a novel fault-tolerant multicast algorithm for n-dimensional wormhole routed hypercubes. The multicast algorithm will remain functional if the number of faulty nodes in an n-dimensional hypercube is less than n. Multicast is the delivery of the same message from one source node to an arbitrary number of destination nodes. Recently, wormhole routing has become one of th...
متن کاملFault-Tolerant Adaptive and Minimal Routing in Mesh-Connected Multicomputers Using Extended Safety Levels
ÐThe minimal routing problem in mesh-connected multicomputers with faulty blocks is studied, Two-dimensional meshes are used to illustrate the approach. A sufficient condition for minimal routing in 2D meshes with faulty blocks is proposed. Unlike many traditional models that assume all the nodes know global fault distribution, our approach is based on the concept of an extended safety level, w...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Trans. Computers
دوره 42 شماره
صفحات -
تاریخ انتشار 1993